Big Data, Vol. 2, No. 1, Industry Column, Open Access

Big Data 2.0: Cataclysm or Catalyst?

Mike Hoskins

Published online: 14 Mar 2014. https://doi.org/10.1089/big.2014.1519

Introduction

Four undeniable trends shape the way we think about data, big or small. While managing big data is rife with challenges, it continues to represent more than $15 trillion in untapped value. New analytics and next-generation platforms hold the key to unlocking that value. Understanding these four trends sheds light on the shift now taking place from Big Data 1.0 to 2.0, a cataclysmic shift for those who fail to see it and take action, and a catalyst for those who embrace the new opportunity.

The world has changed

We now live in the digital age. Everything is made up of bytes and pixels. Billions of devices and apps create a constant flow of data, and all things digital are connected through the Internet of Things.

The world will keep changing

Technology innovation is exploding on four vectors at once: the internet connects everything, everywhere; mobile device adoption outpaces traditional desktop and laptop computing; the cloud is taking over the corporate data center; and big data continues to grow.

Time is the new gold standard

As the speed of change continues to accelerate, time becomes the most valuable commodity. Speed becomes an unfair advantage for those who possess it, and anyone who wants to survive must combine speed with intelligence to take advantage of time.

Data is the new currency

Whoever has the data wins in the end. Hidden in the massive quantities of data are the secrets to acquiring more customers and keeping them, shaping their behavior, and minimizing enterprise risk at levels never before possible. This creates a currency more stable than the dollar, the pound, or the yen.

Big Data 1.0: A Laboratory Science Experiment

It is in the midst of these four trends that we have exploded into the age of big data. Big Data 1.0 was all about introducing new technologies to take advantage of all the new data being created. One of the frontrunners to emerge with great promise is Apache Hadoop, which has gone from open source phenomenon to household name.

In the course of this transition, five vital advancements have been made:

1. Businesses are now capable of enormous, affordable scale compared with old storage paradigms, thanks to the inherent parallelism of modern commodity hardware.
2. We can capture and store all data without first determining its value.
3. New opportunities for analytics are emerging from the new types of data being stored.
4. Data discovery and data provisioning are now common in most organizations.
5. Analytics applications are the key to opening the door to the big data opportunity.

Yet, even with all of this advancement, most big data projects are stuck in the laboratory. Millions have been invested, and the executives behind those investments are still waiting for their return. So what is the problem with big data right now?

1. It is extremely complex to integrate, deploy, operate, and manage.
2. It requires specialty programming skill sets to deploy the system and process data.
3. It lacks enterprise capabilities such as security and high availability.
4. There is a shortage of data science skills.
5. Performance lags behind the need for immediate response times.
Big Data 2.0: Finally, Transformational Value

We are now experiencing the next wave of big data: Big Data 2.0. This new wave is all about surfacing the massive value that has so far remained underwater. As the transition takes hold, expect some major shifts on the technology front that will give companies the ability to move quickly, turn on a dime, and stay ahead of the competition, even in the face of relentless change.

One of these shifts is that cooperative processing will deliver faster time to value and better price performance. There is no single platform that can handle all the workloads necessary to run today's complex organizations. When companies implement Hadoop for data management, traditional databases for operational workloads, and high-performance analytical databases for predictive and prescriptive workloads, they will naturally drive down the cost of computing and accelerate time to value.

Hand in hand with this trend, organizations will start following the market leaders and moving their processing to the data, not vice versa. Until recently, only massively parallel analytics databases were able to move the processing to where the data lives, right on each node. With the introduction of YARN (Yet Another Resource Negotiator) as part of Hadoop 2.0, we can now run sophisticated data transformations and complex analytics directly on data stored in HDFS (the Hadoop Distributed File System); a minimal sketch of this pattern appears at the end of this column.

Another shift is that new analytical building blocks will make analytics accessible to workers without specialized skills. The shortage of data scientists is real, but new analytic functions delivered through visual interfaces or embedded in databases become building blocks accessible to anyone via simple SQL calls or a drag-and-drop workbench. Expect the complexity of data types and platforms to be abstracted away by extended optimizers, common metadata, and accessible visual frameworks.

Lastly, new unified platforms will provide modular approaches spanning the entire analytic process, from connecting and preparing data all the way to analyzing it. These platforms will deploy in modules, growing by function, node, or data storage volume. This approach gives way to adaptive infrastructure, in which nodes deployed on one platform can be quickly spun down and spun up again to support a workload increase on another platform.

So, what does all of this mean for executives who have already invested in big data technology or who have an investment in their 2014 plans? The combination of capabilities being deployed in Big Data 2.0 should yield better price performance, faster time to value, continuous organizational transformation, and finally, an unfair advantage over the competition. The trail was blazed by 1.0, and the major players are finally using their data to act. Big Data 2.0 means a lot of things, but more than anything, it is a plan for leveraging untapped opportunity.
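What follows is a minimal sketch, not taken from the column, of the "move the processing to the data" pattern described above: a classic Hadoop MapReduce job submitted to a YARN cluster, where map tasks are typically scheduled on the nodes that already hold the HDFS blocks they read. The class names, the input layout (comma-separated log records with the event type in the first field), and the HDFS paths are assumptions made for this example.

```java
// Illustrative sketch: counting events per type over log files stored in HDFS.
// Submitted to YARN, the map tasks run in parallel next to the data blocks;
// only the small aggregated results move across the network.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EventCount {

  // Map phase: executed on the nodes storing the input blocks.
  public static class EventMapper
      extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    private final Text eventType = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // Assumed record layout: one log record per line, event type first.
      String[] fields = value.toString().split(",");
      if (fields.length > 0) {
        eventType.set(fields[0]);
        context.write(eventType, ONE);
      }
    }
  }

  // Reduce phase: aggregates the per-node partial counts.
  public static class SumReducer
      extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
        throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable v : values) {
        sum += v.get();
      }
      context.write(key, new LongWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "event count");
    job.setJarByClass(EventCount.class);
    job.setMapperClass(EventMapper.class);
    job.setCombinerClass(SumReducer.class);   // pre-aggregate on each node
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a jar and launched with the standard hadoop jar command, the computation is shipped to the cluster rather than the data being pulled to a central compute tier, which is the shift the column describes.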
Disclosure Statement

This column is sponsored by Actian.